We’re partnering with a leading software company seeking an experienced Site Reliability Engineer (SRE) to help scale and support their Azure-hosted SaaS platform.
You’ll play a key role in ensuring high system reliability, performance, and observability, working across Dev, Ops, and Infrastructure teams to maintain a world-class service.
Maintain high availability and reliability of Azure-based services
Develop and enhance monitoring, alerting, and observability tools
Automate provisioning, deployments, scaling, and incident response
Lead incident management and drive post-incident improvements
Build infrastructure using IaC tools such as ARM templates, Bicep, or Terraform
Optimize performance and ensure compliance with ISO 27001, SOC 2, and GDPR requirements
Proven experience in a SaaS or software product environment
Strong background in Microsoft Azure infrastructure and services
Proficient in scripting/automation (PowerShell preferred)
Experience with monitoring tools (Azure Monitor, Grafana, Prometheus, Datadog)
Knowledge of containers (Docker/Kubernetes) and CI/CD pipelines
Skilled in incident response and root cause analysis
For more information, contact Seamus at Reperio or apply through the link.
Reperio Human Capital acts as an Employment Agency and an Employment Business.